A double smoothing technique for solving unconstrained 
nondifferentiable convex optimization problems 

Radu Ioan Boţ    Christopher Hendrich

February 25, 2013 

Abstract. The aim of this paper is to develop an efficient algorithm for solving a 
class of unconstrained nondifferentiable convex optimization problems in finite dimen- 
sional spaces. To this end we formulate first its Fenchel dual problem and regularize it 
in two steps into a differentiable strongly convex one with Lipschitz continuous gradient. 
The doubly regularized dual problem is then solved via a fast gradient method with the 
aim of accelerating the resulting convergence scheme. The theoretical results are finally
applied to an $\ell_1$ regularization problem arising in image processing.

1 Introduction

In this paper we are interested in solving a specific class of unconstrained convex
optimization problems in finite dimensional spaces. Generally, when characterizing
optimality, convexity allows one to make use of powerful results in convex analysis,
separation theorems and the (Fenchel) conjugate theory included (see [1, 15, 16]). In convex
optimization these are the ingredients for assigning a dual optimization problem via the
perturbation approach to a primal one. When strong duality holds, solving the dual
problem instead is a natural way to obtain an optimal solution to the primal problem,
too. As weak duality is always fulfilled, so-called regularity conditions are needed for
guaranteeing strong duality (see, for example, [5, 6, 16]).

When considering an unconstrained convex and differentiable minimization problem,
there are already plenty of promising methods available for solving it (such as the
steepest descent method, Newton's method or, in an appropriate setting, fast gradient
methods, see [11]). However, in many situations the objective function of the
optimization problem to be solved is nondifferentiable. Therefore, the convex subdifferential
is used instead, not only as a tool for theoretically characterizing optimality, but also as
the counterpart of the gradient in different numerical methods. However, the classical
methods which solve unconstrained convex and nondifferentiable minimization problems
have a rather slow convergence.

The aim of this paper is to develop in finite dimensional spaces an efficient algorithm 
for solving an unconstrained optimization problem having as objective the sum of a con- 
vex function with the composition of another convex function with a linear operator. To 
this end we are not relying on subgradient schemes, since their complexity cannot be
better than $O\left(\frac{1}{\epsilon^2}\right)$ iterations, where $\epsilon > 0$ is the desired accuracy for the objective value
(see [11]). Instead, we show that it is possible to solve the corresponding Fenchel dual
problem efficiently and to reconstruct in this way an approximately optimal solution 
to the primal one. To this end we make use of a double smoothing technique, in fact 
a generalization of the double smoothing approach employed by Devolder, Glineur and 
Nesterov in [8] and [9] for a special class of convex constrained optimization problems. 
This technique makes use of the structure of the dual problem and assumes the regular- 
ization of its objective function into a differentiable strongly convex one with Lipschitz 
continuous gradient. The regularized dual is then solved by a fast gradient method and 
this gives rise to a sequence of dual variables which solve the non-regularized dual
objective in $O\left(\frac{1}{\epsilon}\ln\left(\frac{1}{\epsilon}\right)\right)$ iterations. In addition, the norm of the gradient of the objective
of the regularized dual decreases by the same rate of convergence, a fact which is crucial 
in view of reconstructing an approximately optimal solution to the primal optimization 
problem. 

The structure of the paper is the following. In the forthcoming section we introduce
the class of convex optimization problems which we deal with throughout this
paper, provide its Fenchel dual optimization problem and discuss some duality issues.
In Section 3 we apply the smoothing technique introduced in [12-14] to the dual
objective function in order to make it strongly convex and differentiable with Lipschitz
continuous gradient. In Section 4 the regularized dual problem is solved via an efficient
fast gradient method. Additionally, we investigate the convergence of the dual iterates
to an optimal dual solution with a given accuracy and show how to reconstruct from
it an approximately optimal primal solution. Finally, in Section 5 an $\ell_1$ regularized
linear inverse problem is solved via the presented approach and an application in image
processing is discussed.

2 Preliminaries and problem formulation 



In the following we consider the space $\mathbb{R}^n$ endowed with the Euclidean topology,
i.e. $\|x\| = \sqrt{\langle x, x\rangle} = \sqrt{x^T x}$ for all $x \in \mathbb{R}^n$. By $\mathbb{1}^n$ we denote the vector in $\mathbb{R}^n$
with all entries equal to 1. For a subset $C$ of $\mathbb{R}^n$ we denote by $\mathrm{cl}\, C$ and $\mathrm{ri}\, C$ its closure
and relative interior, respectively. The indicator function of the set $C$ is the function
$\delta_C : \mathbb{R}^n \to \overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ defined by $\delta_C(x) = 0$ for $x \in C$ and $\delta_C(x) = +\infty$, otherwise.
For a function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ we denote by $\mathrm{dom}\, f := \{x \in \mathbb{R}^n : f(x) < +\infty\}$ its effective
domain. We call $f$ proper if $\mathrm{dom}\, f \neq \emptyset$ and $f(x) > -\infty$ for all $x \in \mathbb{R}^n$. The conjugate
function of $f$ is $f^* : \mathbb{R}^n \to \overline{\mathbb{R}}$, $f^*(p) = \sup\{\langle p, x\rangle - f(x) : x \in \mathbb{R}^n\}$ for all $p \in \mathbb{R}^n$. The
biconjugate function of $f$ is $f^{**} : \mathbb{R}^n \to \overline{\mathbb{R}}$, $f^{**}(x) = \sup\{\langle x, p\rangle - f^*(p) : p \in \mathbb{R}^n\}$ and,
when $f$ is proper, convex and lower semicontinuous, according to the Fenchel-Moreau
Theorem, one has $f = f^{**}$. The (convex) subdifferential of the function $f$ at $x \in \mathbb{R}^n$
is the set $\partial f(x) = \{p \in \mathbb{R}^n : f(y) - f(x) \ge p^T(y - x)\ \forall y \in \mathbb{R}^n\}$, if $f(x) \in \mathbb{R}$, and is
taken to be the empty set, otherwise. For a linear operator $A : \mathbb{R}^n \to \mathbb{R}^m$, the operator
$A^* : \mathbb{R}^m \to \mathbb{R}^n$ is the adjoint operator of $A$ and is defined by $\langle A^*y, x\rangle = \langle y, Ax\rangle$ for all
$x \in \mathbb{R}^n$ and all $y \in \mathbb{R}^m$.

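For a nonempty, convex and closed set $C \subseteq \mathbb{R}^n$ we consider the projection operator
$\mathcal{P}_C : \mathbb{R}^n \to C$ defined as $x \mapsto \arg\min_{z \in C} \|x - z\|$. Having two proper functions
$f, g : \mathbb{R}^n \to \overline{\mathbb{R}}$, their infimal convolution is defined by $f \square g : \mathbb{R}^n \to \overline{\mathbb{R}}$,
$(f \square g)(x) = \inf_{y \in \mathbb{R}^n} \{f(y) + g(x - y)\}$ for all $x \in \mathbb{R}^n$. The Moreau envelope of the
function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ of parameter $\gamma > 0$ is defined as the infimal convolution

$${}^{\gamma}f(x) := f \square \left(\frac{1}{2\gamma}\|\cdot\|^2\right)(x) = \inf_{y \in \mathbb{R}^n} \left\{f(y) + \frac{1}{2\gamma}\|x - y\|^2\right\} \quad \forall x \in \mathbb{R}^n.$$

We say that the function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is strongly convex with parameter $\rho > 0$ if for all
$x, y \in \mathbb{R}^n$ and all $\lambda \in (0, 1)$ it holds

$$f(\lambda x + (1 - \lambda)y) \le \lambda f(x) + (1 - \lambda)f(y) - \frac{\rho}{2}\lambda(1 - \lambda)\|x - y\|^2.$$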
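To make the Moreau envelope concrete, the following minimal Python sketch evaluates it for the one-dimensional function $|\cdot|$, an example chosen here for illustration and not taken from the paper. In this case the infimum is attained at the soft-thresholded point, and the envelope coincides with the well-known Huber function; the helper names `prox_abs`, `moreau_env_abs` and `huber` are hypothetical.

```python
def prox_abs(x, gamma):
    """Minimizer of |y| + (x - y)^2 / (2*gamma) over y (soft-thresholding)."""
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0


def moreau_env_abs(x, gamma):
    """Moreau envelope of |.| with parameter gamma, evaluated at its minimizer."""
    y = prox_abs(x, gamma)
    return abs(y) + (x - y) ** 2 / (2.0 * gamma)


def huber(x, gamma):
    """Closed form of the same envelope (the Huber function)."""
    if abs(x) <= gamma:
        return x * x / (2.0 * gamma)
    return abs(x) - gamma / 2.0
```

The envelope is differentiable even though $|\cdot|$ is not, which is the property exploited by the smoothing steps in Section 3.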

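In this work we are dealing with optimization problems of the type

$$(P) \quad \inf_{x \in \mathbb{R}^n} \{f(x) + g(Ax)\}, \tag{1}$$

where $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ and $g : \mathbb{R}^m \to \overline{\mathbb{R}}$ are proper, convex and lower semicontinuous
functions and $A : \mathbb{R}^n \to \mathbb{R}^m$ is a linear operator fulfilling $A(\mathrm{dom}\, f) \cap \mathrm{dom}\, g \neq \emptyset$.
Furthermore, we assume that $\mathrm{dom}\, f$ and $\mathrm{dom}\, g$ are bounded.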

Remark 1. The assumption that $\mathrm{dom}\, f$ and $\mathrm{dom}\, g$ are bounded can be weakened in
the sense that it is sufficient to assume that $\mathrm{dom}\, f$ is bounded. In this situation, in the
formulation of (P) the function $g$ can be replaced by $g + \delta_{\mathrm{cl}(A(\mathrm{dom}\, f))}$, which is a proper,
convex and lower semicontinuous function with bounded effective domain.

On the other hand, one should also notice that the counterparts of the assumptions
considered in [8, 9] in our setting would ask for closedness of the effective domains of
the functions $f$ and $g$, too. However, we will be able to employ the double smoothing
technique for (P) without being obliged to impose this assumption.



According to [5, 6], the Fenchel dual problem to (P) is nothing else than

$$(D) \quad \sup_{p \in \mathbb{R}^m} \{-f^*(A^*p) - g^*(-p)\}, \tag{2}$$

where $f^* : \mathbb{R}^n \to \overline{\mathbb{R}}$ and $g^* : \mathbb{R}^m \to \overline{\mathbb{R}}$ denote the conjugate functions of $f$ and $g$,
respectively. We denote the optimal objective values of the optimization problems (P)
and (D) by $v(P)$ and $v(D)$, respectively.

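The conjugate functions of $f$ and $g$ can be written as

$$f^*(q) = \sup_{x \in \mathrm{dom}\, f} \{\langle q, x\rangle - f(x)\} = -\inf_{x \in \mathrm{dom}\, f} \{\langle -q, x\rangle + f(x)\} \quad \forall q \in \mathbb{R}^n$$

and

$$g^*(p) = \sup_{x \in \mathrm{dom}\, g} \{\langle p, x\rangle - g(x)\} = -\inf_{x \in \mathrm{dom}\, g} \{\langle -p, x\rangle + g(x)\} \quad \forall p \in \mathbb{R}^m,$$

respectively. In the framework considered above, according to [4, Proposition A.8],
the optimization problems arising in the formulation of $f^*(q)$ for all $q \in \mathbb{R}^n$ and $g^*(p)$
for all $p \in \mathbb{R}^m$ are solvable, a fact which implies that $\mathrm{dom}\, f^* = \mathbb{R}^n$ and $\mathrm{dom}\, g^* = \mathbb{R}^m$,
respectively.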

By writing the dual problem (D) equivalently as the infimum optimization problem

$$\inf_{p \in \mathbb{R}^m} \{f^*(A^*p) + g^*(-p)\},$$

one can easily see that the Fenchel dual problem of the latter is

$$\sup_{x \in \mathbb{R}^n} \{-f^{**}(x) - g^{**}(Ax)\},$$

which, by the Fenchel-Moreau Theorem, is nothing else than

$$\sup_{x \in \mathbb{R}^n} \{-f(x) - g(Ax)\}.$$

In order to guarantee strong duality for this primal-dual pair it is sufficient to ensure
that (see, for instance, [5]) $0 \in \mathrm{ri}(A^*(\mathrm{dom}\, g^*) + \mathrm{dom}\, f^*)$. As $f^*$ has full domain, this
regularity condition is automatically fulfilled, which means that $v(D) = v(P)$ and the
primal optimization problem (P) has an optimal solution. Due to the fact that $f$ and $g$
are proper and $A(\mathrm{dom}\, f) \cap \mathrm{dom}\, g \neq \emptyset$, this further implies $v(D) = v(P) \in \mathbb{R}$. Later we
will assume that the dual problem (D) has an optimal solution, too, and that an upper
bound of its norm is known.

Denote by $\theta : \mathbb{R}^m \to \mathbb{R}$, $\theta(p) = f^*(A^*p) + g^*(-p)$, the objective function of (D).
Hence, the latter can be equivalently written as

$$(D) \quad -\inf_{p \in \mathbb{R}^m} \theta(p). \tag{3}$$



Since in general we can guarantee the smoothness of neither $p \mapsto f^*(A^*p)$ nor $p \mapsto g^*(-p)$,
the dual problem (D) is a nondifferentiable convex optimization problem. Our
goal is to solve this problem efficiently and to obtain from here an optimal solution
to (P). To this end, we are not relying on subgradient-type schemes, due to their
slow rates of convergence equal to $O\left(\frac{1}{\epsilon^2}\right)$, but we are applying instead some smoothing
techniques introduced in [12-14]. More precisely, we first regularize the functions $p \mapsto f^*(A^*p)$
and $p \mapsto g^*(-p)$, by taking into account the definitions of the two conjugates,
in order to obtain a smooth approximation of the objective of (3) with a Lipschitz
continuous gradient. Then we solve the regularized dual problem by making use of a fast
gradient method (see [13]) and generate in this way a sequence of dual variables which
approximately solves the problem (D) with a rate of convergence of $O\left(\frac{1}{\epsilon}\right)$. Since similar
properties cannot be ensured for the primal optimization problem (P), the solving of
this problem being actually our goal, we apply a second regularization to the objective
function of (3). This will allow us to make use of a fast gradient method for smooth and
strongly convex functions given in [11] for solving the regularized dual, which implicitly
will solve both the dual problem (D) and the primal problem (P) approximately in
$O\left(\frac{1}{\epsilon}\ln\left(\frac{1}{\epsilon}\right)\right)$ iterations.



3 The double smoothing approach 

3.1 First smoothing 

For a positive real number $\rho > 0$ the function $p \mapsto f^*(A^*p) = \sup_{x \in \mathbb{R}^n} \{\langle A^*p, x\rangle - f(x)\}$
can be approximated by

$$f^*_\rho(A^*p) = \sup_{x \in \mathbb{R}^n} \left\{\langle A^*p, x\rangle - f(x) - \frac{\rho}{2}\|x\|^2\right\}, \tag{4}$$

while, given $\mu > 0$, the function $p \mapsto g^*(-p) = \sup_{x \in \mathbb{R}^m} \{\langle -p, x\rangle - g(x)\}$ can be
approximated by

$$g^*_\mu(-p) = \sup_{x \in \mathbb{R}^m} \left\{\langle -p, x\rangle - g(x) - \frac{\mu}{2}\|x\|^2\right\}. \tag{5}$$

For each $p \in \mathbb{R}^m$ the maximization problems which occur in the formulations of $f^*_\rho(A^*p)$
and $g^*_\mu(-p)$ have a unique solution (see, for instance, [4, Proposition A.8 and Proposition
B.10]), since their objectives are proper, strongly concave (see [10, Proposition B.1.1.2])
and upper semicontinuous functions.

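In order to determine the gradient of the functions $p \mapsto f^*_\rho(A^*p)$ and $p \mapsto g^*_\mu(-p)$,
we are going to make use of the Moreau envelope of the functions $f$ and $g$, respectively.
Indeed, for all $p \in \mathbb{R}^m$ we have

$$\begin{aligned}
-f^*_\rho(A^*p) &= -\sup_{x \in \mathbb{R}^n} \left\{\langle A^*p, x\rangle - f(x) - \frac{\rho}{2}\|x\|^2\right\} \\
&= \inf_{x \in \mathbb{R}^n} \left\{\langle -A^*p, x\rangle + f(x) + \frac{\rho}{2}\|x\|^2\right\} \\
&= \inf_{x \in \mathbb{R}^n} \left\{f(x) + \frac{\rho}{2}\left\|\frac{A^*p}{\rho} - x\right\|^2\right\} - \frac{\|A^*p\|^2}{2\rho} \\
&= {}^{\frac{1}{\rho}}f\left(\frac{A^*p}{\rho}\right) - \frac{\|A^*p\|^2}{2\rho}.
\end{aligned}$$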



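As the Moreau envelope is continuously differentiable (see [1, Proposition 12.29]),
$p \mapsto -f^*_\rho(A^*p)$ is continuously differentiable as well, and it holds for all $p \in \mathbb{R}^m$

$$-\nabla(f^*_\rho \circ A^*)(p) = \frac{1}{\rho} A \nabla\left({}^{\frac{1}{\rho}}f\right)\left(\frac{A^*p}{\rho}\right) - \frac{AA^*p}{\rho} = \frac{1}{\rho} A \left(\rho\left(\frac{A^*p}{\rho} - x_{\rho,p}\right)\right) - \frac{AA^*p}{\rho} = -Ax_{\rho,p},$$

which means that

$$\nabla(f^*_\rho \circ A^*)(p) = Ax_{\rho,p},$$

where $x_{\rho,p} \in \mathbb{R}^n$ is the proximal point of parameter $\frac{1}{\rho}$ of $f$ at $\frac{A^*p}{\rho}$, namely the unique
element in $\mathbb{R}^n$ fulfilling

$${}^{\frac{1}{\rho}}f\left(\frac{A^*p}{\rho}\right) = f(x_{\rho,p}) + \frac{\rho}{2}\left\|\frac{A^*p}{\rho} - x_{\rho,p}\right\|^2.$$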



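By taking into account the nonexpansiveness of the proximal point mapping (see [1,
Proposition 12.27]), for $p, q \in \mathbb{R}^m$ it holds

$$\begin{aligned}
\|\nabla(f^*_\rho \circ A^*)(p) - \nabla(f^*_\rho \circ A^*)(q)\| &= \|Ax_{\rho,p} - Ax_{\rho,q}\| \le \|A\|\,\|x_{\rho,p} - x_{\rho,q}\| \\
&\le \|A\| \left\|\frac{A^*p}{\rho} - \frac{A^*q}{\rho}\right\| \le \frac{\|A\|^2}{\rho}\|p - q\|,
\end{aligned}$$

thus

$$\frac{\|A\|^2}{\rho}$$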



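is the Lipschitz constant of $p \mapsto \nabla(f^*_\rho \circ A^*)(p)$.

For the function $p \mapsto g^*_\mu(-p)$ one can proceed analogously. For all $p \in \mathbb{R}^m$ one has

$$-g^*_\mu(-p) = \inf_{x \in \mathbb{R}^m} \left\{g(x) + \frac{\mu}{2}\left\|-\frac{p}{\mu} - x\right\|^2\right\} - \frac{\|p\|^2}{2\mu} = {}^{\frac{1}{\mu}}g\left(-\frac{p}{\mu}\right) - \frac{\|p\|^2}{2\mu},$$

which is a continuously differentiable function such that

$$-\nabla g^*_\mu(-\cdot)(p) = -\frac{1}{\mu}\nabla\left({}^{\frac{1}{\mu}}g\right)\left(-\frac{p}{\mu}\right) - \frac{p}{\mu} = -\frac{1}{\mu}\left(\mu\left(-\frac{p}{\mu} - x_{\mu,p}\right)\right) - \frac{p}{\mu} = x_{\mu,p},$$

thus,

$$\nabla g^*_\mu(-\cdot)(p) = -x_{\mu,p},$$

where $x_{\mu,p} \in \mathbb{R}^m$ is the proximal point of parameter $\frac{1}{\mu}$ of $g$ at $-\frac{p}{\mu}$, namely the unique
element in $\mathbb{R}^m$ fulfilling

$${}^{\frac{1}{\mu}}g\left(-\frac{p}{\mu}\right) = g(x_{\mu,p}) + \frac{\mu}{2}\left\|-\frac{p}{\mu} - x_{\mu,p}\right\|^2.$$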



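For $p, q \in \mathbb{R}^m$ it holds

$$\|\nabla g^*_\mu(-\cdot)(p) - \nabla g^*_\mu(-\cdot)(q)\| = \|-x_{\mu,p} + x_{\mu,q}\| \le \left\|-\frac{p}{\mu} + \frac{q}{\mu}\right\| = \frac{1}{\mu}\|-p + q\|,$$

so that $\frac{1}{\mu}$ is the Lipschitz constant of $p \mapsto \nabla g^*_\mu(-\cdot)(p)$.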



Remark 2. If $f$ is strongly convex with parameter $\rho > 0$, there is no need to apply
the first regularization for $p \mapsto f^*(A^*p)$, as this function is already differentiable with
a Lipschitz continuous gradient having a Lipschitz constant given by $\frac{\|A\|^2}{\rho}$. The same
applies for $p \mapsto g^*(-p)$, if $g$ is strongly convex with parameter $\mu > 0$, in this case the
Lipschitz constant of its gradient being given by $\frac{1}{\mu}$.

The constants $D_f := \sup\left\{\frac{\|x\|^2}{2} : x \in \mathrm{dom}\, f\right\}$ and $D_g := \sup\left\{\frac{\|x\|^2}{2} : x \in \mathrm{dom}\, g\right\}$ will
play an important role in the upcoming convergence schemes. Since $\mathrm{dom}\, f$ and $\mathrm{dom}\, g$
are bounded, $D_f$ and $D_g$ are real numbers.

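Proposition 3. For all $p \in \mathbb{R}^m$ it holds

$$f^*_\rho(A^*p) \le f^*(A^*p) \le f^*_\rho(A^*p) + \rho D_f \quad \text{and} \quad g^*_\mu(-p) \le g^*(-p) \le g^*_\mu(-p) + \mu D_g.$$

Proof. For $p \in \mathbb{R}^m$ one has

$$\begin{aligned}
f^*_\rho(A^*p) &= \langle A^*p, x_{\rho,p}\rangle - f(x_{\rho,p}) - \frac{\rho}{2}\|x_{\rho,p}\|^2 \le \langle A^*p, x_{\rho,p}\rangle - f(x_{\rho,p}) \le f^*(A^*p) \\
&\le \sup_{x \in \mathrm{dom}\, f} \left\{\langle A^*p, x\rangle - f(x) - \frac{\rho}{2}\|x\|^2\right\} + \sup_{x \in \mathrm{dom}\, f} \left\{\frac{\rho}{2}\|x\|^2\right\} \\
&= f^*_\rho(A^*p) + \rho D_f.
\end{aligned}$$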

The other estimates follow similarly.

For $\rho > 0$ and $\mu > 0$ let $\theta_{\rho,\mu} : \mathbb{R}^m \to \mathbb{R}$ be defined by $\theta_{\rho,\mu}(p) = f^*_\rho(A^*p) + g^*_\mu(-p)$.
The function $\theta_{\rho,\mu}$ is differentiable with a Lipschitz continuous gradient

$$\nabla\theta_{\rho,\mu}(p) = \nabla(f^*_\rho \circ A^*)(p) + \nabla g^*_\mu(-\cdot)(p) = Ax_{\rho,p} - x_{\mu,p}$$

having as Lipschitz constant $L(\rho, \mu) := \frac{\|A\|^2}{\rho} + \frac{1}{\mu}$.
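The first smoothing can be checked numerically on a toy instance. The Python sketch below is an illustration under stated assumptions, not the paper's implementation: it takes $n = 1$, $A$ the identity and the hypothetical choice $f(x) = |x| + \delta_{[-1,1]}(x)$, for which $f^*(q) = \max\{|q| - 1, 0\}$ and $D_f = \frac{1}{2}$. The maximizer in (4) is then a clipped soft-thresholding step, and by the gradient formula above it is also the derivative of the smoothed conjugate.

```python
def soft(v, t):
    """Soft-thresholding of v with threshold t >= 0."""
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)


def maximizer(q, rho):
    """x_{rho,q}: unique maximizer of q*x - |x| - rho/2*x^2 over [-1, 1]."""
    return min(1.0, max(-1.0, soft(q, 1.0) / rho))


def f_conj(q):
    """Conjugate of f(x) = |x| + indicator of [-1, 1]."""
    return max(abs(q) - 1.0, 0.0)


def f_conj_smoothed(q, rho):
    """Smoothed conjugate f*_rho(q) from (4), evaluated at its maximizer."""
    x = maximizer(q, rho)
    return q * x - abs(x) - 0.5 * rho * x * x


D_F = 0.5  # sup of x^2/2 over dom f = [-1, 1]
```

For any `q` and `rho > 0`, `f_conj_smoothed(q, rho) <= f_conj(q) <= f_conj_smoothed(q, rho) + rho * D_F` is the one-dimensional instance of Proposition 3, and `maximizer(q, rho)` plays the role of $Ax_{\rho,p}$.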

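Summing up the inequalities from Proposition 3 we get

$$\theta_{\rho,\mu}(p) \le \theta(p) \le \theta_{\rho,\mu}(p) + \rho D_f + \mu D_g \quad \forall p \in \mathbb{R}^m. \tag{6}$$

Further, for $p \in \mathbb{R}^m$ we have

$$\begin{aligned}
\theta_{\rho,\mu}(p) &= f^*_\rho(A^*p) + g^*_\mu(-p) \\
&= \langle A^*p, x_{\rho,p}\rangle - f(x_{\rho,p}) - \frac{\rho}{2}\|x_{\rho,p}\|^2 + \langle -p, x_{\mu,p}\rangle - g(x_{\mu,p}) - \frac{\mu}{2}\|x_{\mu,p}\|^2
\end{aligned}$$

and from here

$$f(x_{\rho,p}) + g(x_{\mu,p}) - v(D) = \langle p, \nabla\theta_{\rho,\mu}(p)\rangle + (-v(D) - \theta_{\rho,\mu}(p)) - \frac{\rho}{2}\|x_{\rho,p}\|^2 - \frac{\mu}{2}\|x_{\mu,p}\|^2.$$

Thus

$$|f(x_{\rho,p}) + g(x_{\mu,p}) - v(D)| \le |\langle p, \nabla\theta_{\rho,\mu}(p)\rangle| + |v(D) + \theta_{\rho,\mu}(p)| + \rho D_f + \mu D_g. \tag{7}$$

Since $v(P) \ge v(D)$ (weak duality) and $|\theta_{\rho,\mu}(p) + v(D)| \le |\theta(p) + v(D)| + \rho D_f + \mu D_g$,
we conclude that

$$f(x_{\rho,p}) + g(x_{\mu,p}) - v(P) \le |\langle p, \nabla\theta_{\rho,\mu}(p)\rangle| + |\theta(p) + v(D)| + 2\rho D_f + 2\mu D_g. \tag{8}$$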

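Following the ideas in [8], we further consider for the regularized optimization problem
(for $\rho > 0$ and $\mu > 0$)

$$\inf_{p \in \mathbb{R}^m} \theta_{\rho,\mu}(p) \tag{9}$$

the following fast gradient scheme (see [13, scheme (3.11)]):

Init.: Choose $w_0 \in \mathbb{R}^m$ and set $k := 0$.
For $k \ge 0$: Compute $\theta_{\rho,\mu}(w_k)$ and $\nabla\theta_{\rho,\mu}(w_k)$.

    Find $p_k = \arg\min_{w \in \mathbb{R}^m} \left\{\langle \nabla\theta_{\rho,\mu}(w_k), w - w_k\rangle + \frac{L(\rho,\mu)}{2}\|w - w_k\|^2\right\}$.

    Find $z_k = \arg\min_{w \in \mathbb{R}^m} \left\{L(\rho,\mu)\|w_0 - w\|^2 + \sum_{s=0}^{k} \frac{s+1}{2}\left[\theta_{\rho,\mu}(w_s) + \langle \nabla\theta_{\rho,\mu}(w_s), w - w_s\rangle\right]\right\}$.

    Set $w_{k+1} := \frac{2}{k+3} z_k + \frac{k+1}{k+3} p_k$.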

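Assuming that $p^*_s \in \mathbb{R}^m$ is an optimal solution of (9), it follows that $\nabla\theta_{\rho,\mu}(p^*_s) = 0$.
Thus, due to the properties of the above convergence scheme provided in [13], we have

$$\theta_{\rho,\mu}(p_k) - \theta_{\rho,\mu}(p^*_s) \le \frac{4L(\rho,\mu)\|p_0 - p^*_s\|^2}{(k+1)(k+2)} \quad \forall k \ge 0. \tag{10}$$

When $p^* \in \mathbb{R}^m$ is an optimal solution to (D), from (6) we get that $\theta_{\rho,\mu}(p_k) \ge \theta(p_k) - \rho D_f - \mu D_g$
for all $k \ge 0$ and $\theta_{\rho,\mu}(p^*_s) \le \theta_{\rho,\mu}(p^*) \le \theta(p^*) = -v(D)$. Hence, we obtain

$$\theta_{\rho,\mu}(p_k) - \theta_{\rho,\mu}(p^*_s) \ge \theta(p_k) - \rho D_f - \mu D_g + v(D),$$

which further implies that

$$\theta(p_k) + v(D) \le \theta_{\rho,\mu}(p_k) - \theta_{\rho,\mu}(p^*_s) + \rho D_f + \mu D_g \overset{(10)}{\le} \frac{4L(\rho,\mu)\|p_0 - p^*_s\|^2}{(k+1)(k+2)} + \rho D_f + \mu D_g$$

for all $k \ge 0$. Now, in order to guarantee $\theta(p_k) + v(D) \le \epsilon$, namely that $p_k$ is a solution
of the dual problem (D) with $\epsilon$-accuracy, we can force all three terms in the above
inequality to be less than or equal to $\frac{\epsilon}{3}$. By taking

$$\rho := \rho(\epsilon) = \frac{\epsilon}{3D_f} \quad \text{and} \quad \mu := \mu(\epsilon) = \frac{\epsilon}{3D_g}$$

this means that the number of iterations $k$ needed in order to satisfy $\epsilon$-optimality for
the dual iterate depends on the relation

$$\frac{4L(\rho,\mu)\|p_0 - p^*_s\|^2}{(k+1)(k+2)} \le \frac{\epsilon}{3}.$$

Since the Lipschitz constant $L(\rho,\mu) = \frac{\|A\|^2}{\rho} + \frac{1}{\mu}$ is of order $\frac{1}{\epsilon}$, the rate of convergence
for $\theta(p_k) + v(D) \le \epsilon$ is $O\left(\frac{1}{\epsilon}\right)$.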

Further, according to (8), in order to gain an accuracy for the primal optimization
problem proportional to $\epsilon > 0$, one has only to ensure that $|\langle p_k, \nabla\theta_{\rho,\mu}(p_k)\rangle|$ is lower
than or equal to $O(\epsilon)$. However, by [11, Theorem 2.1.5], we have

$$\|\nabla\theta_{\rho,\mu}(p_k)\|^2 \le 2L(\rho,\mu)\left(\theta_{\rho,\mu}(p_k) - \theta_{\rho,\mu}(p^*_s)\right),$$

hence, from (10),

$$\|\nabla\theta_{\rho,\mu}(p_k)\| \le \frac{2\sqrt{2}\,L(\rho,\mu)\|p_0 - p^*_s\|}{\sqrt{(k+1)(k+2)}}.$$

This means that the norm of the gradient $\nabla\theta_{\rho,\mu}(p_k)$ decreases with an order of
$O\left(\frac{1}{k}\right)$. In order to achieve for the primal optimization problem an accuracy which is
proportional to $\epsilon$ via the estimate (8), we need $k = O\left(\frac{1}{\epsilon^2}\right)$ iterations. This convergence
is slow as compared to our aimed rate of convergence of $O\left(\frac{1}{\epsilon}\ln\left(\frac{1}{\epsilon}\right)\right)$ and it is
not better than the rate of convergence of the subgradient approach.



From another point of view, in order to get a feasible solution to the primal
optimization problem (P), it is necessary to investigate the distance between $Ax_{\rho,p_k}$ and
$x_{\mu,p_k}$, since the functions $f$ and $g \circ A$ have to share the same argument (which would
be $x_{\rho,p_k}$, if $\|\nabla\theta_{\rho,\mu}(p_k)\| = \|Ax_{\rho,p_k} - x_{\mu,p_k}\| = 0$). Therefore, the norm of the gradient
$\|\nabla\theta_{\rho,\mu}(p_k)\|$ is an indicator for an approximately feasible solution. Thus, in order
to obtain an approximately optimal solution to (P), it is not sufficient to ensure the
convergence of $\theta(p_k) + v(D)$ to zero; one also needs a good convergence for the decrease of
$\|\nabla\theta_{\rho,\mu}(p_k)\|$.

3.2 Second smoothing 

In the following a second regularization is applied to $\theta_{\rho,\mu}$, as done in [8, 9], in order to
make it strongly convex, a fact which will allow us to use a fast gradient scheme with
a better convergence rate for $\|\nabla\theta_{\rho,\mu}(p_k)\|$. Therefore, adding the strongly convex function
$\frac{\kappa}{2}\|\cdot\|^2$ to $\theta_{\rho,\mu}$ for some positive real number $\kappa$ gives rise to the following regularization
of the objective function

$$\theta_{\rho,\mu,\kappa} : \mathbb{R}^m \to \mathbb{R}, \quad \theta_{\rho,\mu,\kappa}(p) := \theta_{\rho,\mu}(p) + \frac{\kappa}{2}\|p\|^2 = f^*_\rho(A^*p) + g^*_\mu(-p) + \frac{\kappa}{2}\|p\|^2,$$

which is strongly convex with modulus $\kappa > 0$ (cf. [10, Proposition B.1.1.2]). We further
deal with the optimization problem

$$\inf_{p \in \mathbb{R}^m} \theta_{\rho,\mu,\kappa}(p). \tag{11}$$

By taking into account [4, Proposition A.8 and Proposition B.10], the optimization
problem (11) has a unique optimal solution. The function $\theta_{\rho,\mu,\kappa}$ is differentiable and for all
$p \in \mathbb{R}^m$ it holds

$$\nabla\theta_{\rho,\mu,\kappa}(p) = \nabla\left(\theta_{\rho,\mu}(\cdot) + \frac{\kappa}{2}\|\cdot\|^2\right)(p) = Ax_{\rho,p} - x_{\mu,p} + \kappa p.$$

This gradient is Lipschitz continuous with constant $L(\rho,\mu,\kappa) := \frac{\|A\|^2}{\rho} + \frac{1}{\mu} + \kappa$.

4 Solving the doubly regularized dual problem 
4.1 An appropriate fast gradient method 



Denote by $p^*_{DS}$ the unique optimal solution to optimization problem (11) and by
$\theta^*_{\rho,\mu,\kappa} := \theta_{\rho,\mu,\kappa}(p^*_{DS})$ its optimal objective value. Further, let $p^* \in \mathbb{R}^m$ be an optimal solution to
the dual optimization problem (D) and assume that the upper bound

$$\|p^*\| \le R \tag{12}$$

is available for some nonzero $R \in \mathbb{R}_+$.



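We apply to the doubly regularized dual problem (11) the fast gradient method [11,
Algorithm 2.2.11]

Init.: Set $w_0 = p_0 := 0 \in \mathbb{R}^m$.
For $k \ge 0$: Set $p_{k+1} := w_k - \frac{1}{L(\rho,\mu,\kappa)}\nabla\theta_{\rho,\mu,\kappa}(w_k)$.                                  (13)

    Set $w_{k+1} := p_{k+1} + \frac{\sqrt{L(\rho,\mu,\kappa)} - \sqrt{\kappa}}{\sqrt{L(\rho,\mu,\kappa)} + \sqrt{\kappa}}(p_{k+1} - p_k)$.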
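The two update rules of (13) translate almost literally into code. The sketch below is a minimal Python rendering for a generic smooth, strongly convex function; the function name `fast_gradient_strongly_convex` and the way the gradient oracle is passed in are assumptions made for illustration. In the setting of this paper the oracle would return $Ax_{\rho,w} - x_{\mu,w} + \kappa w$, which requires the two proximal points.

```python
import math


def fast_gradient_strongly_convex(grad, lipschitz, kappa, dim, iterations):
    """Constant-step scheme (13), started at w_0 = p_0 = 0.

    Returns the list of iterates [p_0, p_1, ..., p_iterations].
    """
    p = [0.0] * dim
    w = [0.0] * dim
    sqrt_l, sqrt_k = math.sqrt(lipschitz), math.sqrt(kappa)
    momentum = (sqrt_l - sqrt_k) / (sqrt_l + sqrt_k)
    iterates = [list(p)]
    for _ in range(iterations):
        g = grad(w)
        # p_{k+1} := w_k - (1/L) * gradient at w_k
        p_next = [w[i] - g[i] / lipschitz for i in range(dim)]
        # w_{k+1} := p_{k+1} + momentum * (p_{k+1} - p_k)
        w = [p_next[i] + momentum * (p_next[i] - p[i]) for i in range(dim)]
        p = p_next
        iterates.append(list(p))
    return iterates
```

The momentum factor depends only on the ratio $\kappa / L(\rho,\mu,\kappa)$, which is why this ratio governs the linear rate in the estimates that follow.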



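By taking into account [11, Theorem 2.2.3] we obtain a sequence $(p_k)_{k \ge 0} \subseteq \mathbb{R}^m$ satisfying

$$\begin{aligned}
\theta_{\rho,\mu,\kappa}(p_k) - \theta^*_{\rho,\mu,\kappa} &\le \left(\theta_{\rho,\mu,\kappa}(p_0) - \theta^*_{\rho,\mu,\kappa} + \frac{\kappa}{2}\|p_0 - p^*_{DS}\|^2\right)\left(1 - \sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}\right)^k \\
&\le \left(\theta_{\rho,\mu,\kappa}(p_0) - \theta^*_{\rho,\mu,\kappa} + \frac{\kappa}{2}\|p_0 - p^*_{DS}\|^2\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} && (14) \\
&\le 2\left(\theta_{\rho,\mu,\kappa}(p_0) - \theta^*_{\rho,\mu,\kappa}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \quad \forall k \ge 0, && (15)
\end{aligned}$$

while the last inequality is a consequence of [11, Theorem 2.1.8]. Since $p^*_{DS}$ is the unique
optimal solution to (11), we have $\nabla\theta_{\rho,\mu,\kappa}(p^*_{DS}) = 0$ and therefore [11, Theorem 2.1.5]
yields

$$\frac{1}{2L(\rho,\mu,\kappa)}\|\nabla\theta_{\rho,\mu,\kappa}(p_k)\|^2 \le \theta_{\rho,\mu,\kappa}(p_k) - \theta^*_{\rho,\mu,\kappa} \overset{(15)}{\le} 2\left(\theta_{\rho,\mu,\kappa}(p_0) - \theta^*_{\rho,\mu,\kappa}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}},$$

which implies

$$\|\nabla\theta_{\rho,\mu,\kappa}(p_k)\|^2 \le 4L(\rho,\mu,\kappa)\left(\theta_{\rho,\mu,\kappa}(p_0) - \theta^*_{\rho,\mu,\kappa}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \quad \forall k \ge 0. \tag{16}$$

Due to the strong convexity of $\theta_{\rho,\mu,\kappa}$ with modulus $\kappa > 0$, Theorem 2.1.8 in [11] states

$$\frac{\kappa}{2}\|p_k - p^*_{DS}\|^2 \le \theta_{\rho,\mu,\kappa}(p_k) - \theta^*_{\rho,\mu,\kappa} \overset{(15)}{\le} 2\left(\theta_{\rho,\mu,\kappa}(p_0) - \theta^*_{\rho,\mu,\kappa}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \quad \forall k \ge 0. \tag{17}$$

Using this inequality it follows that (see also [8, 9])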

$$\|p_k - p^*_{DS}\|^2 \le \min\left\{\|p_0 - p^*_{DS}\|^2,\ \frac{4}{\kappa}\left(\theta_{\rho,\mu,\kappa}(p_0) - \theta^*_{\rho,\mu,\kappa}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}}\right\} \quad \forall k \ge 0. \tag{18}$$

We will show in the following that the rates of convergence for the decrease of $\|\nabla\theta_{\rho,\mu}(p_k)\|$
and $\theta(p_k) + v(D)$ are the same, namely equal to $O\left(\frac{1}{\epsilon}\ln\left(\frac{1}{\epsilon}\right)\right)$. This will allow us to
efficiently recover approximately optimal solutions to the initial optimization problem
(P).

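4.2 Convergence of $\theta(p_k)$ to $-v(D)$

Since $p_0 = 0$, we have

$$\theta_{\rho,\mu,\kappa}(0) = f^*_\rho(0) + g^*_\mu(0) + \frac{\kappa}{2}\|0\|^2 = \theta_{\rho,\mu}(0)$$

and

$$\theta_{\rho,\mu,\kappa}(p^*_{DS}) = \theta_{\rho,\mu}(p^*_{DS}) + \frac{\kappa}{2}\|p^*_{DS}\|^2 \tag{19}$$

and obtain

$$\theta_{\rho,\mu,\kappa}(0) - \theta_{\rho,\mu,\kappa}(p^*_{DS}) = \theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS}) - \frac{\kappa}{2}\|p^*_{DS}\|^2,$$

which, combined with (17) for $k = 0$, implies that

$$\|p^*_{DS}\|^2 \le \frac{1}{\kappa}\left(\theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS})\right). \tag{20}$$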



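In addition, for all $k \ge 0$ it holds

$$\begin{aligned}
\|p_k - p^*_{DS}\|^2 &\overset{(17)}{\le} \frac{2}{\kappa}\left(\theta_{\rho,\mu,\kappa}(p_k) - \theta_{\rho,\mu,\kappa}(p^*_{DS})\right) \\
&\overset{(14)}{\le} \frac{2}{\kappa}\left(\theta_{\rho,\mu,\kappa}(0) - \theta_{\rho,\mu,\kappa}(p^*_{DS}) + \frac{\kappa}{2}\|0 - p^*_{DS}\|^2\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \\
&\overset{(19)}{=} \frac{2}{\kappa}\left(\theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS})\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}}
\end{aligned} \tag{21}$$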



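and

$$\begin{aligned}
\theta_{\rho,\mu}(p_k) - \theta_{\rho,\mu}(p^*_{DS}) &\overset{(14)}{\le} \left(\theta_{\rho,\mu,\kappa}(0) - \theta_{\rho,\mu,\kappa}(p^*_{DS}) + \frac{\kappa}{2}\|0 - p^*_{DS}\|^2\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \\
&\qquad + \frac{\kappa}{2}\left(\|p^*_{DS}\|^2 - \|p_k\|^2\right) \\
&\overset{(19)}{=} \left(\theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS})\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} + \frac{\kappa}{2}\left(\|p^*_{DS}\|^2 - \|p_k\|^2\right).
\end{aligned} \tag{22}$$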



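Investigating the last term in the estimate above, using $\left|\|p^*_{DS}\| - \|p_k\|\right| \le \|p^*_{DS} - p_k\|$
and $\|p_k\| = \|p_k - p^*_{DS} + p^*_{DS}\| \le \|p_k - p^*_{DS}\| + \|p^*_{DS}\|$, we get for all $k \ge 0$

$$\begin{aligned}
\|p^*_{DS}\|^2 - \|p_k\|^2 &= \left(\|p^*_{DS}\| - \|p_k\|\right)\left(\|p^*_{DS}\| + \|p_k\|\right) \\
&\le \|p^*_{DS} - p_k\|\left(\|p^*_{DS}\| + \|p_k\|\right) \\
&\le \|p^*_{DS} - p_k\|\left(2\|p^*_{DS}\| + \|p_k - p^*_{DS}\|\right) \\
&\overset{(18)}{\le} 3\|p^*_{DS} - p_k\|\,\|p^*_{DS}\| \\
&\overset{(21)}{\le} 3\|p^*_{DS}\|\sqrt{\frac{2}{\kappa}\left(\theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS})\right)}\, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \\
&\overset{(20)}{\le} \frac{3\sqrt{2}}{\kappa}\left(\theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS})\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}}.
\end{aligned}$$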



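Inserting this result into (22), we obtain for all $k \ge 0$

$$\begin{aligned}
\theta_{\rho,\mu}(p_k) - \theta_{\rho,\mu}(p^*_{DS}) &\le \left(\theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS})\right)\left(e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} + \frac{3}{\sqrt{2}}\, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}}\right) \\
&\le \frac{25}{8}\left(\theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS})\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}}.
\end{aligned} \tag{23}$$

Further, we have $\theta_{\rho,\mu}(0) \overset{(6)}{\le} \theta(0)$ and

$$\theta_{\rho,\mu}(p^*_{DS}) \overset{(6)}{\ge} \theta(p^*_{DS}) - \rho D_f - \mu D_g \ge \theta(p^*) - \rho D_f - \mu D_g,$$

and, from here,

$$\theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS}) \le \theta(0) - \theta(p^*) + \rho D_f + \mu D_g. \tag{24}$$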

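Finally, since $\theta_{\rho,\mu}(p^*_{DS}) \le \theta_{\rho,\mu}(p^*_{DS}) + \frac{\kappa}{2}\|p^*_{DS}\|^2 \le \theta_{\rho,\mu}(p^*) + \frac{\kappa}{2}\|p^*\|^2$, we conclude that

$$\theta_{\rho,\mu}(p^*_{DS}) \le \theta_{\rho,\mu}(p^*) + \frac{\kappa}{2}\|p^*\|^2 \overset{(6)}{\le} \theta(p^*) + \frac{\kappa}{2}\|p^*\|^2$$

and, therefore, for all $k \ge 0$

$$\theta_{\rho,\mu}(p_k) - \theta_{\rho,\mu}(p^*_{DS}) \overset{(6)}{\ge} \theta(p_k) - \rho D_f - \mu D_g - \theta(p^*) - \frac{\kappa}{2}\|p^*\|^2. \tag{25}$$

In conclusion, we obtain for all $k \ge 0$

$$\begin{aligned}
\theta(p_k) - \theta(p^*) &\overset{(25)}{\le} \rho D_f + \mu D_g + \frac{\kappa}{2}\|p^*\|^2 + \theta_{\rho,\mu}(p_k) - \theta_{\rho,\mu}(p^*_{DS}) \\
&\overset{(12),(23)}{\le} \rho D_f + \mu D_g + \frac{\kappa}{2}R^2 + \frac{25}{8}\left(\theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS})\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \\
&\overset{(24)}{\le} \rho D_f + \mu D_g + \frac{\kappa}{2}R^2 + \frac{25}{8}\left(\theta(0) - \theta(p^*) + \rho D_f + \mu D_g\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}}.
\end{aligned} \tag{26}$$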

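Next we fix $\epsilon > 0$. In order to get $\theta(p_k) + v(D) \le \epsilon$ for a certain number of iterations
$k$, we force all four terms in (26) to be less than or equal to $\frac{\epsilon}{4}$. Therefore, we choose

$$\rho := \rho(\epsilon) = \frac{\epsilon}{4D_f}, \quad \mu := \mu(\epsilon) = \frac{\epsilon}{4D_g}, \quad \kappa := \kappa(\epsilon) = \frac{\epsilon}{2R^2}. \tag{27}$$

With these new parameters we can simplify (26) to

$$\theta(p_k) + v(D) \le \frac{3\epsilon}{4} + \frac{25}{8}\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}}.$$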



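As we see, the second term in the expression on the right-hand side of the above estimate
determines the number of iterations which is needed to obtain $\epsilon$-accuracy for the dual
objective function $\theta$. Indeed, we have

$$\begin{aligned}
\frac{\epsilon}{4} \ge \frac{25}{8}\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right) e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}}
&\iff e^{\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \ge \frac{25}{2\epsilon}\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right) \\
&\iff \frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}} \ge \ln\left(\frac{25\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right)}{2\epsilon}\right) \\
&\iff k \ge 2\sqrt{\frac{L(\rho,\mu,\kappa)}{\kappa}}\ln\left(\frac{25\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right)}{2\epsilon}\right)
\end{aligned} \tag{28}$$

iterations. A closer look at $\frac{L(\rho,\mu,\kappa)}{\kappa}$ shows that

$$\frac{L(\rho,\mu,\kappa)}{\kappa} = \frac{\|A\|^2}{\rho\kappa} + \frac{1}{\mu\kappa} + 1 \overset{(27)}{=} \frac{8\|A\|^2 D_f R^2}{\epsilon^2} + \frac{8 D_g R^2}{\epsilon^2} + 1,$$

hence, in order to obtain an approximately optimal solution to (D), we need
$k = O\left(\frac{1}{\epsilon}\ln\left(\frac{1}{\epsilon}\right)\right)$ iterations.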
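The parameter choice (27) and the iteration estimate (28) are simple enough to tabulate. The Python sketch below does so; the function names and the argument `theta_gap`, which stands for $\theta(0) - \theta(p^*)$ and is in practice not known exactly, are assumptions made for illustration.

```python
import math


def smoothing_parameters(eps, D_f, D_g, R):
    """Parameter choice (27): returns (rho, mu, kappa) for target accuracy eps."""
    return eps / (4.0 * D_f), eps / (4.0 * D_g), eps / (2.0 * R ** 2)


def iteration_bound(eps, D_f, D_g, R, norm_A, theta_gap):
    """Right-hand side of (28); theta_gap stands for theta(0) - theta(p*)."""
    rho, mu, kappa = smoothing_parameters(eps, D_f, D_g, R)
    lipschitz = norm_A ** 2 / rho + 1.0 / mu + kappa
    log_term = math.log(25.0 * (theta_gap + eps / 2.0) / (2.0 * eps))
    return 2.0 * math.sqrt(lipschitz / kappa) * log_term
```

Halving `eps` roughly doubles the square-root factor and slightly increases the logarithm, which is the $O\left(\frac{1}{\epsilon}\ln\left(\frac{1}{\epsilon}\right)\right)$ behavior stated above.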

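4.3 Convergence of $\|\nabla\theta_{\rho,\mu}(p_k)\|$ to 0

As follows from (8), guaranteeing $\epsilon$-optimality for the objective values of $\theta$ is not
sufficient for solving the initial primal optimization problem with a good convergence
rate in the absence of a similar behavior of $\|\nabla\theta_{\rho,\mu}(p_k)\| = \|Ax_{\rho,p_k} - x_{\mu,p_k}\|$. In the
following we show that the fast gradient method (13) applied to the doubly regularized
function $\theta_{\rho,\mu,\kappa}$ furnishes the desired properties for the decrease of $\|\nabla\theta_{\rho,\mu}(p_k)\|$ (see also
[8, 9]). Since

$$\|p_k\| = \|p_k - p^*_{DS} + p^*_{DS}\| \le \|p_k - p^*_{DS}\| + \|p^*_{DS}\| \overset{(18)}{\le} 2\|p^*_{DS}\|,$$

we have

$$\begin{aligned}
\|\nabla\theta_{\rho,\mu}(p_k)\| &= \|\nabla\theta_{\rho,\mu}(p_k) + \kappa p_k - \kappa p_k\| = \|\nabla\theta_{\rho,\mu,\kappa}(p_k) - \kappa p_k\| \\
&\le \|\nabla\theta_{\rho,\mu,\kappa}(p_k)\| + \|\kappa p_k\| = \|\nabla\theta_{\rho,\mu,\kappa}(p_k)\| + \kappa\|p_k\| \\
&\le \|\nabla\theta_{\rho,\mu,\kappa}(p_k)\| + 2\kappa\|p^*_{DS}\| \quad \forall k \ge 0.
\end{aligned} \tag{29}$$

Having a closer look at the first term in the previous estimate one can notice that

$$\begin{aligned}
\|\nabla\theta_{\rho,\mu,\kappa}(p_k)\|^2 &\overset{(16)}{\le} 4L(\rho,\mu,\kappa)\left(\theta_{\rho,\mu,\kappa}(0) - \theta_{\rho,\mu,\kappa}(p^*_{DS})\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \\
&\overset{(19)}{\le} 4L(\rho,\mu,\kappa)\left(\theta_{\rho,\mu}(0) - \theta_{\rho,\mu}(p^*_{DS})\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \\
&\overset{(24)}{\le} 4L(\rho,\mu,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right) e^{-k\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}},
\end{aligned}$$

thus,

$$\|\nabla\theta_{\rho,\mu,\kappa}(p_k)\| \le 2\sqrt{L(\rho,\mu,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right)}\, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \quad \forall k \ge 0. \tag{30}$$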
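Furthermore, in order to gain an upper bound for the norm of $p^*_{DS}$, we notice that

$$\begin{aligned}
\theta(p^*) + \frac{\kappa}{2}\|p^*\|^2 &\overset{(6)}{\ge} \theta_{\rho,\mu}(p^*) + \frac{\kappa}{2}\|p^*\|^2 \ge \theta_{\rho,\mu}(p^*_{DS}) + \frac{\kappa}{2}\|p^*_{DS}\|^2 \\
&\overset{(6)}{\ge} \theta(p^*_{DS}) - \rho D_f - \mu D_g + \frac{\kappa}{2}\|p^*_{DS}\|^2 \\
&\ge \theta(p^*) - \rho D_f - \mu D_g + \frac{\kappa}{2}\|p^*_{DS}\|^2,
\end{aligned}$$

which implies $\frac{\kappa}{2}\|p^*_{DS}\|^2 \le \frac{\kappa}{2}\|p^*\|^2 + \rho D_f + \mu D_g$ or, equivalently,

$$\|p^*_{DS}\|^2 \le \|p^*\|^2 + \frac{2\rho}{\kappa} D_f + \frac{2\mu}{\kappa} D_g.$$

Hence,

$$\|p^*_{DS}\| \le \sqrt{\|p^*\|^2 + \frac{2\rho}{\kappa} D_f + \frac{2\mu}{\kappa} D_g} \overset{(27)}{=} \sqrt{\|p^*\|^2 + 2R^2} \overset{(12)}{\le} \sqrt{3}\,R, \tag{31}$$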



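which, combined with (29) and (30), provides the following estimate for the norm of
the gradient of $\theta_{\rho,\mu}$ at $p_k$ for $k \ge 0$:

$$\begin{aligned}
\|\nabla\theta_{\rho,\mu}(p_k)\| &\le 2\sqrt{L(\rho,\mu,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right)}\, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} + 2\sqrt{3}\,\kappa R \\
&\overset{(27)}{=} 2\sqrt{L(\rho,\mu,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right)}\, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} + \frac{\sqrt{3}\,\epsilon}{R}.
\end{aligned} \tag{32}$$

For $\epsilon > 0$ fixed, the first term in (32) decreases with the iteration counter $k$, while, in
order to ensure that $\|\nabla\theta_{\rho,\mu}(p_k)\| \le \frac{2\epsilon}{R}$, we have to pass

$$\begin{aligned}
&\frac{(2 - \sqrt{3})\epsilon}{R} \ge 2\sqrt{L(\rho,\mu,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right)}\, e^{-\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \\
&\iff e^{\frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}}} \ge \frac{2R\sqrt{L(\rho,\mu,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right)}}{(2 - \sqrt{3})\epsilon} \\
&\iff \frac{k}{2}\sqrt{\frac{\kappa}{L(\rho,\mu,\kappa)}} \ge \ln\left(\frac{\sqrt{4R^2 L(\rho,\mu,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right)}}{(2 - \sqrt{3})\epsilon}\right) \\
&\iff k \ge 2\sqrt{\frac{L(\rho,\mu,\kappa)}{\kappa}}\ln\left(\frac{\sqrt{4R^2 L(\rho,\mu,\kappa)\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right)}}{(2 - \sqrt{3})\epsilon}\right) \\
&\iff k \ge \frac{2}{\epsilon}\sqrt{\epsilon^2 + 8R^2\left(\|A\|^2 D_f + D_g\right)} \cdot \ln\left(\frac{\sqrt{\left(2\epsilon^2 + 16R^2\left(\|A\|^2 D_f + D_g\right)\right)\left(\theta(0) - \theta(p^*) + \frac{\epsilon}{2}\right)}}{(2 - \sqrt{3})\,\epsilon^{3/2}}\right)
\end{aligned} \tag{33}$$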



iterations of the fast gradient method (13). In the above estimate, we used that
$\frac{L(\rho,\mu,\kappa)}{\kappa} = 1 + \frac{8R^2}{\epsilon^2}\left(\|A\|^2 D_f + D_g\right)$ and $L(\rho,\mu,\kappa) = \frac{4\|A\|^2 D_f}{\epsilon} + \frac{4D_g}{\epsilon} + \frac{\epsilon}{2R^2}$ (see (27)).

Summarizing the results of the last two subsections, it follows that $k = O\left(\frac{1}{\epsilon}\ln\left(\frac{1}{\epsilon}\right)\right)$
iterations are needed to guarantee

$$\theta(p_k) + v(D) \le \epsilon \quad \text{and} \quad \|\nabla\theta_{\rho,\mu}(p_k)\| \le \frac{2\epsilon}{R} \tag{34}$$

with a rate of convergence which is very similar except for constant factors.

4.4 How to construct an approximately primal optimal solution 

Next, by making use of the approximate dual solution $p_k$, for $k \ge 0$, we construct
an approximately primal optimal solution for the initial problem (P) and investigate
its accuracy. To this end we will make use of the sequences $(x_{\rho,p_k})_{k \ge 0} \subseteq \mathrm{dom}\, f$ and
$(x_{\mu,p_k})_{k \ge 0} \subseteq \mathrm{dom}\, g$ which are delivered by the algorithmic scheme (13). We will prove
that, given a fixed accuracy $\epsilon > 0$, we are able to reconstruct an approximately primal
optimal solution such that, for $\rho$ and $\mu$ chosen as in (27), one gets

$$|f(x_{\rho,p_k}) + g(x_{\mu,p_k}) - v(D)| \le 2(1 + 2\sqrt{3})\epsilon, \tag{35}$$

$$\|Ax_{\rho,p_k} - x_{\mu,p_k}\| \le \frac{2\epsilon}{R}, \tag{36}$$

in the same number of iterations as needed in order to satisfy (34). Let $k := k(\epsilon)$ be
the smallest index with this property. By means of weak duality, i.e. $v(D) \le v(P)$,
(35) would imply that $f(x_{\rho,p_k}) + g(x_{\mu,p_k}) \le v(P) + 2(1 + 2\sqrt{3})\epsilon$, which would further
mean that $x_{\rho,p_k} \in \mathrm{dom}\, f$ and $x_{\mu,p_k} \in \mathrm{dom}\, g$ fulfilling (35) as well as (36) can be seen as
approximately optimal and feasible solutions to the primal optimization problem (P)
with an accuracy which is proportional to $\epsilon$.



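Now let us prove the validity of the inequalities above. As $\nabla\theta_{\rho,\mu}(p_k) = Ax_{\rho,p_k} - x_{\mu,p_k}$,
relation (36) follows directly from (34). Thus, we have to prove only that (35) is true.
To this aim, we notice first that, since $\theta_{\rho,\mu}(p_k) + v(D) \overset{(6)}{\le} \theta(p_k) + v(D) \le \epsilon$ and

$$\theta_{\rho,\mu}(p_k) + v(D) \overset{(6)}{\ge} \theta(p_k) - \rho D_f - \mu D_g + v(D) \overset{(27)}{=} \theta(p_k) + v(D) - \frac{\epsilon}{2} \ge -\frac{\epsilon}{2},$$

we have $|\theta_{\rho,\mu}(p_k) + v(D)| \le \epsilon$. From (7) it follows

$$\begin{aligned}
|f(x_{\rho,p_k}) + g(x_{\mu,p_k}) - v(D)| &\le \|p_k\|\,\|\nabla\theta_{\rho,\mu}(p_k)\| + \epsilon + \rho D_f + \mu D_g \\
&\overset{(27)}{\le} \|p_k\|\,\|\nabla\theta_{\rho,\mu}(p_k)\| + 2\epsilon \\
&\overset{(34)}{\le} \frac{2\epsilon}{R}\|p_k\| + 2\epsilon.
\end{aligned}$$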



Further, in order to get an upper bound for $\|p_k\|$, we use that

$$\|p_k\| = \|p_k + p^*_{DS} - p^*_{DS}\| \le \|p_k - p^*_{DS}\| + \|p^*_{DS}\| \overset{(18)}{\le} 2\|p^*_{DS}\| \overset{(31)}{\le} 2\sqrt{3}\,R,$$

and, finally, we obtain

$$|f(x_{\rho,p_k}) + g(x_{\mu,p_k}) - v(D)| \le 4\sqrt{3}\,\epsilon + 2\epsilon = 2(2\sqrt{3} + 1)\epsilon.$$

4.5 Existence of an optimal solution

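In this section we study the convergence behavior of the primal sequences produced
by the fast gradient method towards an optimal solution of (P) when $\epsilon \downarrow 0$. Let
$(\epsilon_n)_{n \ge 0} \subseteq \mathbb{R}_+$ be a decreasing sequence of positive scalars with $\lim_{n \to \infty} \epsilon_n = 0$. For each
$n \ge 0$ we can make $k = k(\epsilon_n)$ iterations of the double smoothing algorithm (13) with
smoothing parameters $\rho_{\epsilon_n}$, $\mu_{\epsilon_n}$ and $\kappa_{\epsilon_n}$ given by (27) in order to have (34) satisfied.
For $n \ge 0$ we denote

$$\bar{x}_n := x_{\rho_{\epsilon_n}, p_{k(\epsilon_n)}} \in \mathrm{dom}\, f \quad \text{and} \quad \bar{y}_n := x_{\mu_{\epsilon_n}, p_{k(\epsilon_n)}} \in \mathrm{dom}\, g.$$

Due to the boundedness of $\mathrm{dom}\, f$ and $\mathrm{dom}\, g$, there exist a subsequence of indices
$(n_l)_{l \ge 0} \subseteq (n)_{n \ge 0}$, $\bar{x} \in \mathbb{R}^n$ and $\bar{y} \in \mathbb{R}^m$ such that

$$\bar{x}_{n_l} \to \bar{x} \in \mathrm{cl}(\mathrm{dom}\, f) \quad \text{and} \quad \bar{y}_{n_l} \to \bar{y} \in \mathrm{cl}(\mathrm{dom}\, g).$$

In view of relation (36) we obtain

$$0 \le \|A\bar{x}_{n_l} - \bar{y}_{n_l}\| \le \frac{2\epsilon_{n_l}}{R}, \tag{37}$$

for each $l \ge 0$. For $l \to +\infty$ in (37) we get $A\bar{x} = \bar{y}$. Furthermore, due to (35), we have

$$f(\bar{x}_{n_l}) + g(\bar{y}_{n_l}) \le v(D) + 2(1 + 2\sqrt{3})\epsilon_{n_l} \quad \forall l \ge 0$$

and, by using the lower semicontinuity of $f$ and $g$, we obtain

$$f(\bar{x}) + g(A\bar{x}) \le \liminf_{l \to \infty} \{f(\bar{x}_{n_l}) + g(\bar{y}_{n_l})\} \le \lim_{l \to \infty} \{v(D) + 2(1 + 2\sqrt{3})\epsilon_{n_l}\} = v(D) \le v(P).$$

By taking into account that $v(P) < +\infty$, it follows that $\bar{x} \in \mathrm{dom}\, f$ and $A\bar{x} \in \mathrm{dom}\, g$,
thus $\bar{x}$ is an optimal solution of the primal problem (P).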

5 An example in image processing 

In this section we are solving a linear inverse problem which arises in the field of signal 
and image processing by means of the double smoothing algorithm developed in the 
preceding sections. For a given matrix A £ M. nxn describing a blur operator and a given 
vector b representing the blurred and noisy image the task is to estimate the unknown 
original image x* £ W 1 fulfilling 

Ax = b. 

To this end we solve the following nonsmooth $\ell_1$ regularized convex optimization problem

$$(P) \quad \inf_{x \in S} \{\|Ax - b\|_1 + \lambda\|x\|_1\},$$

where $S \subseteq \mathbb{R}^n$ is an $n$-dimensional cube representing the range of the pixels and $\lambda > 0$ is the regularization parameter. The problem to be solved can be equivalently written as

$$(P) \quad \inf_{x \in \mathbb{R}^n} \{f(x) + g(Ax)\},$$

for $f : \mathbb{R}^n \to \overline{\mathbb{R}}$, $f(x) = \lambda\|x\|_1 + \delta_S(x)$ and $g : \mathbb{R}^n \to \overline{\mathbb{R}}$, $g(y) = \|y - b\|_1 + \delta_S(y)$ (one has that $A(S) \subseteq S$, since for $x \in S$ the pixels of the blurred picture $Ax$ have naturally the same range). Thus both functions $f$ and $g$ are proper, convex and lower semicontinuous and have bounded effective domains.
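
To make the roles of $f$ and $g$ concrete, the following minimal Python sketch evaluates $f(x) + g(Ax)$ for small dense data. It is an illustration only and not part of the Matlab experiments: the function name `primal_objective`, the default box bound `upper = 0.1` and the rounding tolerance `tol` are assumptions introduced here, and the indicator functions are modelled by returning $+\infty$ outside the box.

```python
import math

def primal_objective(A, x, b, lam, upper=0.1, tol=1e-12):
    """Evaluate f(x) + g(Ax) for f = lam*||.||_1 + delta_S, g = ||. - b||_1 + delta_S.

    S is the box [0, upper]^n (upper is an assumed default). A is a list of
    rows, x and b are lists of floats. Points outside S receive the value +inf.
    """
    def in_box(v):
        # Small tolerance so that rounding in A*x does not leave the box.
        return all(-tol <= t <= upper + tol for t in v)

    Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    if not in_box(x) or not in_box(Ax):
        return math.inf                      # an indicator function is active
    f_val = lam * sum(abs(t) for t in x)     # lam * ||x||_1
    g_val = sum(abs(s - t) for s, t in zip(Ax, b))  # ||Ax - b||_1
    return f_val + g_val
```

For an averaging matrix $A$ whose rows are nonnegative and sum to 1, the second membership test never fails for $x \in S$, which is the inclusion $A(S) \subseteq S$ used above.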

Since each pixel furnishes a greyscale value which is between 0 and 255, a natural choice for the convex set $S$ would be the $n$-dimensional cube $[0, 255]^n \subseteq \mathbb{R}^n$. In order to reduce the Lipschitz constants which appear in the developed approach, we scale all the pictures used within this section so that each of their pixels ranges in the interval $[0, \frac{1}{10}]$.

In this section we concretely look at the 256 x 256 cameraman test image, which is part of the image processing toolbox in Matlab. The dimension of the vectorized and scaled cameraman test image is $n = 256^2 = 65536$. By making use of the Matlab functions imfilter and fspecial, this image is blurred as follows:



    H=fspecial('gaussian',9,4);          % gaussian blur of size 9 times 9
                                         % and standard deviation 4
    B=imfilter(X,H,'conv','symmetric');  % B=observed blurred image
                                         % X=original image



In row 1 the function fspecial returns a rotationally symmetric Gaussian lowpass filter of size 9 x 9 with standard deviation 4. The entries of H are nonnegative and their sum adds up to 1. In row 3 the function imfilter convolves the filter H with the image $X \in \mathbb{R}^{256 \times 256}$ and outputs the blurred image $B \in \mathbb{R}^{256 \times 256}$. The boundary option "symmetric" avoids the dark edges of the blurred picture B which normally appear after a convolution (provided that X and B have the same dimensions).

Thanks to the rotationally symmetric filter H, the linear operator $A \in \mathbb{R}^{n \times n}$ given by the Matlab function imfilter is symmetric, too. Since each entry in B can be seen as a convex combination of elements in X with coefficients in H, we have $A(S) \subseteq S$. The norm $\|A\|$ is not explicitly given and is estimated by 1. After adding a zero-mean white Gaussian noise with standard deviation $10^{-4}$, we obtain the blurred and noisy image $b \in \mathbb{R}^n$ which is shown in Figure 5.1.
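
The two properties of the blur used above (kernel entries that are nonnegative and sum to 1, and output pixels that are convex combinations of input pixels) can be checked on a small example. The following pure-Python sketch is not a reimplementation of fspecial or imfilter: the helper names `gaussian_kernel`, `reflect` and `blur` are hypothetical, the kernel is an assumed normalized sampling of a Gaussian on an integer grid, and the border rule mirrors the image so that the border pixel is repeated.

```python
import math

def gaussian_kernel(size=9, sigma=4.0):
    """Normalized Gaussian kernel sampled on an integer grid centred at 0."""
    half = (size - 1) // 2
    raw = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
            for j in range(-half, half + 1)]
           for i in range(-half, half + 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]

def reflect(i, n):
    """Mirror an out-of-range index back into 0..n-1, repeating the border pixel."""
    while i < 0 or i >= n:
        i = -i - 1 if i < 0 else 2 * n - 1 - i
    return i

def blur(X, H):
    """Filter the image X (list of rows) with the kernel H using mirrored borders."""
    rows, cols = len(X), len(X[0])
    half = (len(H) - 1) // 2
    B = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(-half, half + 1):
                for j in range(-half, half + 1):
                    # Each output pixel is a weighted average of input pixels.
                    acc += H[i + half][j + half] * X[reflect(r - i, rows)][reflect(c - j, cols)]
            B[r][c] = acc
    return B
```

Because every output pixel is a weighted average with weights summing to 1, a constant image is left unchanged and the output never leaves the range of the input, which is the mechanism behind $A(S) \subseteq S$.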



Figure 5.1: The 256 x 256 cameraman test image (panels: original; blurred and noisy)
One should also notice that, as both functions occurring in the formulation of (P) are nondifferentiable, the classical iterative shrinkage thresholding algorithm and its variants (see [2, 3, 7]) cannot be taken into account for solving this optimization problem. Indeed, in this situation the double smoothing technique is our first choice for solving (P) with an optimal first-order method.

The dual optimization problem in minimization form is

$$(D) \quad \inf_{p \in \mathbb{R}^n} \{f^*(A^*p) + g^*(-p)\}$$



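and, due to the fact that $x' := \frac{1}{20}\mathbb{1}^n \in \operatorname{ri}(S) \cap A(\operatorname{ri}(S))$, it has an optimal solution (see, for instance, [5, 6]). By taking into consideration (27), the smoothing parameters are chosen as

$$\rho = \frac{\varepsilon}{4D_f}, \quad \mu = \frac{\varepsilon}{4D_g}, \quad \kappa = \frac{\varepsilon}{2R^2} \tag{38}$$

for $D_f = D_g = \sup\left\{\frac{\|x\|^2}{2} : x \in [0, \frac{1}{10}]^n\right\} = 327.68$ and $R = 0.05$, while the accuracy is chosen to be $\varepsilon = 0.01$.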
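
The constant 327.68 can be verified by a short computation: the supremum of $\|x\|^2/2$ over a cube $[0, u]^n$ is attained at the corner $(u, \ldots, u)$ and equals $n u^2 / 2$. The Python sketch below assumes $u = 1/10$, which reproduces the stated value for $n = 65536$. The parameter rule used in its last three lines, and the name `kappa` for the third parameter, are assumptions about the form of (27) and are not taken verbatim from the text.

```python
n = 256 ** 2      # number of pixels of the vectorized image
upper = 0.1       # assumed pixel range [0, 1/10] after scaling
eps = 0.01        # accuracy chosen in the text
R = 0.05          # constant R stated in the text

# sup of ||x||^2 / 2 over the cube is attained at the corner (upper, ..., upper)
D_f = n * upper ** 2 / 2.0
D_g = D_f

# Assumed form of the parameter rule (27); treat these three lines as hypothetical.
rho = eps / (4.0 * D_f)
mu = eps / (4.0 * D_g)
kappa = eps / (2.0 * R ** 2)
```

Under these assumptions $\rho$ and $\mu$ are of order $10^{-6}$, which indicates how mild the first smoothing step is at accuracy $\varepsilon = 0.01$.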

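In the following we show that the proximal points can be exactly calculated in each iteration of the algorithm, due to the fact that they occur as optimal solutions of some separable convex optimization problems. Indeed, since for $k \ge 0$

$$-f^*_\rho(A^*w_k) = \inf_{x \in \mathbb{R}^n} \left\{ f(x) + \frac{\rho}{2}\left\|\frac{A^*w_k}{\rho} - x\right\|^2 \right\} - \frac{\|A^*w_k\|^2}{2\rho} = \inf_{x \in S} \left\{ \lambda\|x\|_1 + \frac{\rho}{2}\left\|\frac{A^*w_k}{\rho} - x\right\|^2 \right\} - \frac{\|A^*w_k\|^2}{2\rho},$$

the proximal point of $f$ of parameter $\frac{1}{\rho}$ at $\frac{A^*w_k}{\rho}$ fulfills

$$x_{\rho,w_k} = \operatorname*{arg\,min}_{x \in [0, \frac{1}{10}]^n} \left\{ \sum_{i=1}^n \left[ \lambda|x_i| + \frac{\rho}{2}\left(\frac{(A^*w_k)_i}{\rho} - x_i\right)^2 \right] \right\}$$

and its calculation requires the solving of the following one-dimensional convex optimization problem for $i = 1, \ldots, n$:

$$\inf_{x_i \in [0, \frac{1}{10}]} \left\{ \lambda x_i + \frac{\rho}{2}\left(\frac{(A^*w_k)_i}{\rho} - x_i\right)^2 \right\},$$

which has as unique optimal solution $P_{[0, \frac{1}{10}]}\left(\frac{1}{\rho}\left((A^*w_k)_i - \lambda\right)\right)$. Thus,

$$x_{\rho,w_k} = P_{[0, \frac{1}{10}]^n}\left(\frac{1}{\rho}\left(A^*w_k - \lambda\mathbb{1}^n\right)\right).$$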
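
The closed form for the proximal point of $f$ amounts to a shift followed by a componentwise projection onto the interval. A minimal Python sketch, assuming the interval $[0, \frac{1}{10}]$ as default, is given below; the function name `prox_f` and the argument name `q` (standing for the vector $A^*w_k$) are hypothetical.

```python
def prox_f(q, lam, rho, upper=0.1):
    """Minimize lam*x_i + rho/2*(q_i/rho - x_i)^2 over [0, upper] for each i.

    q plays the role of A^* w_k. The unconstrained minimizer (q_i - lam)/rho
    is clipped to the interval, which is valid because each problem is a
    one-dimensional convex problem.
    """
    return [min(max((q_i - lam) / rho, 0.0), upper) for q_i in q]
```

On $[0, \frac{1}{10}]$ one has $|x_i| = x_i$, so the objective is a smooth quadratic in each coordinate and no soft-thresholding case distinction is needed.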



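On the other hand, since for $k \ge 0$

$$-g^*_\mu(-w_k) = \inf_{x \in \mathbb{R}^n} \left\{ g(x) + \frac{\mu}{2}\left\|-\frac{w_k}{\mu} - x\right\|^2 \right\} - \frac{\|w_k\|^2}{2\mu} = \inf_{x \in S} \left\{ \|x - b\|_1 + \frac{\mu}{2}\left\|-\frac{w_k}{\mu} - x\right\|^2 \right\} - \frac{\|w_k\|^2}{2\mu}$$

$$= \inf_{x \in [0, \frac{1}{10}]^n} \left\{ \sum_{i=1}^n \left[ |x_i - b_i| + \frac{\mu}{2}\left(-\frac{(w_k)_i}{\mu} - x_i\right)^2 \right] \right\} - \frac{\|w_k\|^2}{2\mu},$$

the calculation of the proximal point of $g$ of parameter $\frac{1}{\mu}$ at $-\frac{w_k}{\mu}$ requires the solving of the following one-dimensional convex optimization problem for $i = 1, \ldots, n$:

$$\inf_{x_i \in [0, \frac{1}{10}]} \left\{ |x_i - b_i| + \frac{\mu}{2}\left(-\frac{(w_k)_i}{\mu} - x_i\right)^2 \right\}.$$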




Figure 5.2: Iterations of the double smoothing algorithm (recovered panel titles: $F_{100} = 36.462875$, $F_{200} = 14.359078$, $F_{500} = 4.254065$)
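For a fixed $k \ge 0$ we consider for $i = 1, \ldots, n$ the function $h_i : \mathbb{R} \to \mathbb{R}$,

$$h_i(z) = |z - b_i| + \frac{\mu}{2}\left(-\frac{(w_k)_i}{\mu} - z\right)^2.$$

For $i = 1, \ldots, n$ the optimal solution of the above problem is the projection of the unique global minimum $z_i$ of $h_i$ (cf. [4, Proposition A.8 and Proposition B.10]) on $[0, \frac{1}{10}]$. For $i = 1, \ldots, n$ we have

$$0 \in \partial h_i(z_i) = \partial\left(|\cdot - b_i| + \frac{\mu}{2}\left(-\frac{(w_k)_i}{\mu} - \cdot\right)^2\right)(z_i) = \partial(|\cdot - b_i|)(z_i) - \mu\left(-\frac{(w_k)_i}{\mu} - z_i\right),$$

which is equivalent to

$$-(w_k)_i \in \partial(|\cdot - b_i|)(z_i) + \mu z_i = \begin{cases} 1 + \mu z_i & : z_i > b_i \\ [-1 + \mu b_i, 1 + \mu b_i] & : z_i = b_i \\ -1 + \mu z_i & : z_i < b_i. \end{cases}$$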
Hence, the unique global minimum $z_i$ can be calculated as follows:

$$z_i = \begin{cases} -\dfrac{(w_k)_i + 1}{\mu} & : (w_k)_i < -1 - \mu b_i \\ b_i & : -1 - \mu b_i \le (w_k)_i \le 1 - \mu b_i \\ \dfrac{1 - (w_k)_i}{\mu} & : (w_k)_i > 1 - \mu b_i. \end{cases}$$

All in all, the proximal point of $g$ of parameter $\frac{1}{\mu}$ at $-\frac{w_k}{\mu}$ is for $z = (z_1, \ldots, z_n)^T$ given by

$$x_{\mu,w_k} = P_{[0, \frac{1}{10}]^n}(z).$$
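
The three-case formula for the proximal point of $g$ translates directly into code. The following Python sketch is an illustration under the assumption that the pixel range is $[0, \frac{1}{10}]$; the function name `prox_g` and the default `upper` are hypothetical.

```python
def prox_g(w, b, mu, upper=0.1):
    """Minimize |x_i - b_i| + mu/2*(-w_i/mu - x_i)^2 over [0, upper] for each i.

    w plays the role of w_k. The global minimizer z_i of the one-dimensional
    function is found by the three-case formula and then projected onto the
    interval.
    """
    out = []
    for w_i, b_i in zip(w, b):
        if w_i < -1.0 - mu * b_i:
            z = -(w_i + 1.0) / mu    # minimizer lies to the right of b_i
        elif w_i > 1.0 - mu * b_i:
            z = (1.0 - w_i) / mu     # minimizer lies to the left of b_i
        else:
            z = b_i                  # the kink of |. - b_i| is optimal
        out.append(min(max(z, 0.0), upper))
    return out
```

The middle case reflects the fact that the subdifferential of $|\cdot - b_i|$ at $b_i$ is the whole interval $[-1, 1]$, so a range of values of $(w_k)_i$ is mapped to the same point $b_i$.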



Figure 5.3: Convergence to an approximately optimal and feasible primal solution (plot title: Cameraman; horizontal axis: iterations k, from 0 to 600; logarithmic vertical axis)
The iterations 50, 100, 200 and 500 of the double smoothing iterative scheme are shown in Figure 5.2 for $\lambda = 2\mathrm{e}{-6}$ and $F_k := f(x_{\rho,p_k}) + g(Ax_{\rho,p_k})$. The decrease of $F_k$ and $\|Ax_{\rho,p_k} - x_{\mu,p_k}\|$ can be seen in Figure 5.3. The function values of $-\theta(p_k)$ are shown in the latter as well.



6 Conclusions 

The subject of this paper can be summarized as the development of a first-order method for solving unconstrained nondifferentiable convex optimization problems in finite dimensional spaces having as objective the sum of a convex function with the composition of another convex function with a linear operator. The provided method minimizes the doubly regularized Fenchel dual objective and allows one to reconstruct an approximately optimal primal solution in $O\left(\frac{1}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)\right)$ iterations, which outperforms the classical subgradient approach.
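
To give a feeling for the gap between the two rates, the short Python sketch below compares the order $\frac{1}{\varepsilon}\ln\frac{1}{\varepsilon}$ with the order $\frac{1}{\varepsilon^2}$ that is commonly quoted as the worst-case complexity of subgradient schemes. All constants are omitted, so the numbers are orders of magnitude rather than iteration counts, and the function names are hypothetical.

```python
import math

def double_smoothing_order(eps):
    """(1/eps) * ln(1/eps), constants omitted."""
    return (1.0 / eps) * math.log(1.0 / eps)

def subgradient_order(eps):
    """1/eps^2, the usual worst-case order for subgradient schemes, constants omitted."""
    return 1.0 / eps ** 2
```

For the accuracy $\varepsilon = 0.01$ used in Section 5 the two expressions evaluate to roughly 460 and 10000, respectively.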